Day 12: Introduction to Hugging Face Ecosystem
Hugging Face is often called the GitHub of the AI/ML community: you can share models, datasets, and demos, and the Transformers library lets you run almost any published model in just a few lines of code.
Hugging Face Ecosystem Components
| Service | Role | URL |
|---|---|---|
| Hub (Models) | Repository for 500K+ models | huggingface.co/models |
| Hub (Datasets) | Repository for 100K+ datasets | huggingface.co/datasets |
| Spaces | Demo app hosting (Gradio, Streamlit) | huggingface.co/spaces |
| Transformers | Model loading/inference library | pip install transformers |
| Datasets | Dataset loading/processing library | pip install datasets |
| PEFT | Efficient fine-tuning (LoRA, etc.) | pip install peft |
| TRL | RLHF/DPO training library | pip install trl |
| Accelerate | Multi-GPU/TPU training | pip install accelerate |
Account Creation and Token Setup
```python
# Step 1: Create an account at https://huggingface.co
# Step 2: Generate a token at https://huggingface.co/settings/tokens
# Step 3: Log in

# Method 1: CLI login
#   pip install huggingface_hub
#   huggingface-cli login

# Method 2: Login from Python
from huggingface_hub import login

login(token="hf_YOUR_TOKEN_HERE")  # reading from the HF_TOKEN environment variable is recommended

# Method 3: Environment variable (.env file)
#   HF_TOKEN=hf_YOUR_TOKEN_HERE
```
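As a sketch of Method 3, the token can be read from the environment instead of being hard-coded in source. The helper name `get_hf_token` is ours for illustration, not part of `huggingface_hub`:

```python
import os

def get_hf_token():
    """Read the token from the HF_TOKEN environment variable (raises if unset)."""
    token = os.environ.get("HF_TOKEN")
    if token is None:
        raise RuntimeError("HF_TOKEN is not set; add it to your .env file or shell profile")
    return token

# Once HF_TOKEN is set, log in without the token ever appearing in your code:
# from huggingface_hub import login
# login(token=get_hf_token())
```

This keeps tokens out of version control, which matters because a leaked write token grants access to your whole Hub account.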
Using Models with Transformers
```python
# pip install transformers torch
from transformers import pipeline

# Sentiment analysis (one line!)
classifier = pipeline("sentiment-analysis")
result = classifier("I love learning about LLMs!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="gpt2")
output = generator("The future of AI is", max_new_tokens=30)
print(output[0]["generated_text"])

# Translation
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("How are you today?")
print(result)  # [{'translation_text': "Comment allez-vous aujourd'hui ?"}]

# Question answering
qa = pipeline("question-answering")
result = qa(
    question="What is Hugging Face?",
    context="Hugging Face is a platform for sharing AI models and datasets.",
)
print(f"Answer: {result['answer']} (confidence: {result['score']:.2%})")
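Each pipeline above returns a list of dicts with `label`/`score`-style keys, so post-processing is plain Python. As an illustrative sketch (the helper `top_label` is ours, not part of Transformers), picking the winning label from a classification pipeline asked for all labels (`top_k=None`) might look like:

```python
def top_label(label_scores):
    """Return the (label, score) pair with the highest score.

    `label_scores` is a list of {'label': ..., 'score': ...} dicts, the shape
    a classification pipeline returns per input when top_k=None.
    (Illustrative helper, not part of the transformers library.)
    """
    best = max(label_scores, key=lambda r: r["score"])
    return best["label"], best["score"]

# Hard-coded input shaped like a sentiment pipeline's output:
print(top_label([{"label": "NEGATIVE", "score": 0.0002},
                 {"label": "POSITIVE", "score": 0.9998}]))  # → ('POSITIVE', 0.9998)
```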
Datasets Library
```python
# pip install datasets
from datasets import load_dataset

# Load a popular dataset (automatic download + caching)
dataset = load_dataset("squad", split="train[:100]")
print(f"Number of samples: {len(dataset)}")
print(f"Columns: {dataset.column_names}")
print(f"First example: {dataset[0]['question']}")

# Korean dataset
ko_dataset = load_dataset("kor_nlu", "sts", split="train[:50]")
print(f"\nKorean NLU data: {len(ko_dataset)} samples")

# Dataset preprocessing
def preprocess(example):
    example["question_length"] = len(example["question"])
    return example

processed = dataset.map(preprocess)
print(f"Average question length: {sum(processed['question_length']) / len(processed):.0f} chars")
```
Searching and Downloading Models from Hub
```python
from huggingface_hub import HfApi

api = HfApi()

# Search for Korean models
models = list(api.list_models(
    search="korean",
    sort="downloads",
    direction=-1,
    limit=5,
))
print("Popular Korean-related models:")
for model in models:
    print(f"  {model.id} (downloads: {model.downloads:,})")

# Check a specific model's info (gated models like Llama still expose public metadata)
model_info = api.model_info("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(f"\nModel: {model_info.id}")
print(f"Downloads: {model_info.downloads:,}")
print(f"Likes: {model_info.likes:,}")
print(f"Tags: {model_info.tags[:5]}")
```
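Beyond searching metadata, individual files can be fetched directly with `hf_hub_download`. A minimal sketch (gpt2's `config.json` is tiny and ungated, so it makes a safe first download; repeated calls hit the local cache):

```python
from huggingface_hub import hf_hub_download

# Download one file from a model repo; returns the local cached path
path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)
```

For full model weights you would normally let `transformers` handle downloading via `from_pretrained`, but direct file access is useful for inspecting configs and tokenizer files.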
Creating Demos with Hugging Face Spaces
Spaces is a service that hosts Gradio or Streamlit apps for free.
```python
# pip install gradio
import gradio as gr
from transformers import pipeline

# Simple sentiment analysis demo
classifier = pipeline("sentiment-analysis")

def analyze_sentiment(text):
    result = classifier(text)[0]
    return f"{result['label']} (confidence: {result['score']:.2%})"

demo = gr.Interface(
    fn=analyze_sentiment,
    inputs=gr.Textbox(label="Text Input", placeholder="Enter a sentence to analyze"),
    outputs=gr.Textbox(label="Sentiment Analysis Result"),
    title="Sentiment Analysis Demo",
    description="Analyzes the sentiment (positive/negative) of text.",
)
demo.launch()

# Deploy to Spaces: huggingface-cli repo create, then git push
```
Hugging Face is essential infrastructure for LLM development. From model downloads to fine-tuning and deployment, everything can be handled within this ecosystem. Starting next week, we will use these tools to begin hands-on projects.
Today’s Exercises
- Create a Hugging Face account and obtain a token. Run GPT-2 using `pipeline("text-generation")` and compare the generation results for Korean and English inputs.
- Find and load a Korean dataset using the `datasets` library, then print the data structure and the first 5 samples.
- Create a simple text summarization demo with Gradio. You can use `pipeline("summarization")`.